Datastore: implement barrier if we see "in failed tx" error. #1418
For some error modes, Postgres returns an error to the caller and then fails all later statements within the same transaction with an "in failed SQL transaction" error. In effect, one statement receives a "root cause" error and every later statement receives an "in failed SQL transaction" error. In a pipelined scenario, if our code processes the results of these statements concurrently (e.g. because they are part of a `try_join!`/`try_join_all` group), we might receive and handle one of the "in failed SQL transaction" errors before we handle the "root cause" error, which might cause the "root cause" error's future to be cancelled before we evaluate it. If the "root cause" error would trigger a retry, we would then skip a DB-based retry when one was warranted.

To fix this problem, we (internally) wrap all direct DB operations in `run_op`. This function groups concurrent database operations into "operation groups", which allow us to wait for all operations in the group to complete (this waiting is called "draining"). If we ever observe an "in failed SQL transaction" error, we drain the operation group before returning. Under the assumption that the "root cause" error is concurrent with the "in failed SQL transaction" errors, this guarantees we will evaluate the "root cause" error for retry before any errors make their way out of the transaction code.

Closes #1417.
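
For readers who want a concrete picture of draining, here is a minimal sketch of the idea, not the code in this PR. It swaps async futures for OS threads and `std` synchronization primitives so that it runs without external crates. `OpGroup`, `DbError`, `demo`, the `run_op` signature, and the "serialization failure" message are all hypothetical names chosen for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical error type: a "root cause" error and the follow-on error.
#[derive(Debug, Clone, PartialEq)]
pub enum DbError {
    RootCause(String),
    InFailedTransaction,
}

/// Hypothetical operation group: counts in-flight operations so that a
/// caller can wait ("drain") until all of them have completed.
pub struct OpGroup {
    in_flight: Mutex<usize>,
    all_done: Condvar,
}

impl OpGroup {
    pub fn new() -> Self {
        OpGroup { in_flight: Mutex::new(0), all_done: Condvar::new() }
    }

    /// Runs `op` as a member of this group. If `op` reports an
    /// "in failed SQL transaction" error, waits for every other in-flight
    /// operation to finish before returning that error.
    pub fn run_op<T, F>(&self, op: F) -> Result<T, DbError>
    where
        F: FnOnce() -> Result<T, DbError>,
    {
        *self.in_flight.lock().unwrap() += 1;
        let result = op();

        // This operation is done: deregister it and wake any drainers.
        let mut in_flight = self.in_flight.lock().unwrap();
        *in_flight -= 1;
        self.all_done.notify_all();

        if matches!(result, Err(DbError::InFailedTransaction)) {
            // Drain: block until the rest of the group has completed.
            while *in_flight > 0 {
                in_flight = self.all_done.wait(in_flight).unwrap();
            }
        }
        result
    }
}

/// Runs a slow "root cause" operation concurrently with a fast operation
/// that fails with "in failed SQL transaction". Returns the fast result,
/// the slow result, and whether the slow operation had already produced
/// its error when the fast call returned.
pub fn demo() -> (Result<(), DbError>, Result<(), DbError>, bool) {
    let group = Arc::new(OpGroup::new());
    let root_cause_seen = Arc::new(AtomicBool::new(false));
    let (started_tx, started_rx) = mpsc::channel();

    let slow = {
        let group = Arc::clone(&group);
        let seen = Arc::clone(&root_cause_seen);
        thread::spawn(move || {
            group.run_op(|| -> Result<(), DbError> {
                // Sent from inside the op, so it is already registered.
                started_tx.send(()).unwrap();
                thread::sleep(Duration::from_millis(50));
                seen.store(true, Ordering::SeqCst);
                Err(DbError::RootCause("serialization failure".to_string()))
            })
        })
    };

    // Wait until the slow operation is in flight, then fail fast.
    started_rx.recv().unwrap();
    let fast = group.run_op(|| -> Result<(), DbError> {
        Err(DbError::InFailedTransaction)
    });
    let seen_on_return = root_cause_seen.load(Ordering::SeqCst);

    (fast, slow.join().unwrap(), seen_on_return)
}
```

In this sketch the fast call still returns its own "in failed SQL transaction" error, but only after the slow operation has finished, so retry logic that inspects both results would see the "root cause" error. A panicking operation would leave the counter incremented; a fuller version would decrement it in a guard's `Drop` implementation.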